Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims:

1. (Currently amended) A method performed by a computer system, the method 
comprising: 

extracting, by [[a processor of]] one or more processors associated with the computer 
system, a set of uniform resource locators (URLs) from one document or from multiple 
documents associated with a single web host; 

identifying, by [[the processor]] one or more processors associated with the computer 
system, sub-strings occurring in multiple URLs in the set of URLs as session identifiers, based 
on a particular rule and based on the sub-strings occurring in multiple URLs of the set of URLs; 

generating, by [[the processor]] one or more processors associated with the computer system, 
a clean set of URLs from the set of URLs by removing the session identifiers; and 

determining, by [[the processor]] one or more processors associated with the computer 
system, when at least one particular URL has already been crawled based on a comparison of the 
particular URL to the clean set of URLs. 

2. (Canceled) 

3. (Previously presented) The method of claim 1, where the document or each of the 
multiple documents is a web document downloaded from a web site. 

4. (Previously presented) The method of claim 1, where the comparison of the 
particular URL to the clean set of URLs comprises calculating a fingerprint value for a particular 
URL and for each of the URLs in the clean set of URLs, and where the comparison 
is based on a comparison of the fingerprint value of the particular URL to the fingerprint values 
of the URLs in the clean set of URLs. 

5. (Previously presented) The method of claim 1, where the particular rule 
comprises: 

determining that the sub-strings do not reference content. 

6. (Canceled) 

7. (Previously presented) The method of claim 1, where the particular rule 
comprises: 

determining that the sub-strings contain characters consistent with a session identifier. 

8. (Previously presented) The method of claim 1, further comprising: 

downloading content from the particular URL when the particular URL is determined to 
not already have been crawled. 

9. (Previously presented) The method of claim 1, further comprising: 

storing information based on the clean set of URLs for use in later determining whether 
additional URLs have already been extracted; and 
storing the set of URLs, including embedded session identifiers, for use in later accessing 
the set of URLs. 

10. (Currently amended) A method performed by a computer system, the method 
comprising: 

receiving, by a communication interface [[an input device of]] associated with the 
computer system, a set of uniform resource locators (URLs); 

analyzing, by [[a processor of]] one or more processors associated with the computer system, 
the set of URLs for sub-strings that are structured in a manner consistent with session identifiers; 
and 

further analyzing, by [[the processor]] one or more processors associated with the computer 
system, the set of URLs to identify one of the sub-strings as corresponding to a session identifier 
based on multiple occurrences of the sub-string in the set of URLs. 

11. (Previously presented) The method of claim 10, where the set of URLs are 
extracted from a web document associated with a web host. 

12. (Previously presented) The method of claim 10, where the set of URLs are 
extracted from multiple web documents associated with a single web host. 

13. (Previously presented) The method of claim 10, further comprising: 
removing identified session identifiers from the set of URLs; and 

storing the set of URLs, with the removed session identifiers, as a clean set of URLs. 

14. (Previously presented) The method of claim 13, further comprising: 
adding a generated session identifier to URLs in the clean set of URLs. 

15. (Previously presented) A device comprising: 
a memory to store instructions; and 

a processor to execute the instructions to implement: 

at least one fetch bot to download content on a network from locations specified 
by uniform resource locators (URLs); 

a content manager to: 

extract URLs from the downloaded content, and 

identify session identifiers from the URLs extracted from the downloaded 
content based, at least in part, on multiple occurrences of the session identifiers 
from a single web site; and 

a URL manager to create clean versions of the URLs extracted from the 
downloaded content by removing the session identifiers from the URLs and to store the 
clean versions of the URLs. 

16. (Previously presented) The device of claim 15, where the content manager is 
further to identify the session identifiers based on locating sub-strings, within the URLs extracted 
from the downloaded content, that contain characters consistent with session identifiers. 

17. (Previously presented) The device of claim 15, further comprising: 
a database to store the downloaded content. 

18. (Previously presented) The device of claim 15, where the URL manager is further 
to determine when additional URLs have previously been stored by comparing clean versions of 
the additional URLs to the stored clean versions of the URLs extracted from the downloaded 
content. 

19. (Previously presented) The device of claim 15, where the session identifiers 
include characters from the URLs extracted from the downloaded content that do not reference 
content. 

20. (Currently amended) A [[device]] system comprising: 
one or more server devices comprising: 

[[hardware-implemented]] means for receiving a set of uniform resource locators 
(URLs); 

[[hardware-implemented]] means for analyzing the set of URLs for sub-strings that 
are structured in a manner consistent with session identifiers; and 

[[hardware-implemented]] means for further analyzing the set of URLs to identify 
one of the sub-strings as corresponding to a session identifier based on multiple occurrences of 
the sub-string in the set of URLs. 

21. (Currently amended) The [[device]] system of claim 20, where the set of URLs are 
extracted from a web document associated with a web host. 

22. (Currently amended) The [[device]] system of claim 20, where the set of URLs are 
extracted from multiple web documents associated with a single web host. 

23. (Currently amended) The [[device]] system of claim 20, further comprising: 
means for removing the identified session identifiers from the set of URLs; and 
means for storing the set of URLs with the removed session identifiers as a clean set of 
URLs. 

24. (Currently amended) The [[device]] system of claim 23, further comprising: 
means for adding a generated session identifier to URLs in the clean set of URLs. 

25. (Currently amended) One or more memory devices that include programming 
instructions [[that when executed]] executable by [[at least one processor]] one or more processors, the 
one or more memory devices [[causes the at least one processor to perform a method]] including: 

one or more instructions to [[receiving]] extract a set of uniform resource locators (URLs) 
from one document or from multiple documents associated with a single web host; 

one or more instructions to [[analyzing]] identify, in the set of URLs, [[for]] sub-strings that 
[[are structured in a manner consistent with session identifiers]] contain at least a particular number 
of characters or have at least a particular measure of randomness; and 

one or more instructions to further [[analyzing]] identify, in the identified sub-strings, 
[[the set of URLs to identify]] one of the sub-strings as corresponding to a session identifier based on 
multiple occurrences of the sub-string in the set of extracted URLs. 

26-27. (Canceled) 

28. (Currently amended) The one or more memory devices of claim 25, [[where the 
programming instructions further include programming instructions that cause the at least one 
processor to]] further comprising: 

one or more instructions to remove the session [[identifiers]] identifier from the set of URLs; 
and 

one or more instructions to store the set of URLs with the removed session [[identifiers]] 
identifier as a clean set of URLs. 

29. (Currently amended) The one or more memory devices of claim 28, [[where the 
programming instructions further include programming instructions that cause the at least one 
processor to]] further comprising: 

one or more instructions to add a generated session identifier to URLs in the clean set of 
URLs when the URLs are to be used to access a web document. 

30. (New) The method of claim 1, where the particular rule comprises: 
determining that the sub-strings exhibit at least a particular measure of randomness. 

31. (New) The method of claim 10, where analyzing the set of URLs for sub-strings 
that are structured in a manner consistent with session identifiers includes identifying sub-strings 
that have at least a particular measure of randomness. 

32. (New) The device of claim 15, where identifying session identifiers from the 
URLs extracted from the downloaded content is further based on identifying sub-strings that 
exhibit at least a particular measure of randomness. 

33. (New) The system of claim 20, where the means for analyzing the set of URLs 
for sub-strings that are structured in a manner consistent with session identifiers comprise means 
for identifying sub-strings that have at least a particular measure of randomness. 
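
The steps recited in claims 1, 4, 10, and 30 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the minimum length, the entropy threshold used as the measure of randomness, the choice of SHA-1 as the fingerprint, and the example.com URLs are all hypothetical.

```python
import hashlib
import math
import re
from collections import Counter

# Hypothetical thresholds; the claims only recite "a particular number of
# characters" and "a particular measure of randomness".
MIN_LENGTH = 16
MIN_ENTROPY_BITS = 3.0

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def shannon_entropy(s):
    """Bits per character; one possible (assumed) randomness measure."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def looks_like_session_id(token):
    """Rule check: long enough and random-looking enough."""
    return len(token) >= MIN_LENGTH and shannon_entropy(token) >= MIN_ENTROPY_BITS


def find_session_ids(urls, min_urls=2):
    """Return candidate sub-strings that occur in at least `min_urls` URLs."""
    seen_in = Counter()
    for url in urls:
        # set() so a token repeated inside one URL is counted once per URL
        for token in set(TOKEN_RE.findall(url)):
            if looks_like_session_id(token):
                seen_in[token] += 1
    return {tok for tok, n in seen_in.items() if n >= min_urls}


def clean_url(url, session_ids):
    """Remove every identified session identifier from the URL."""
    for sid in session_ids:
        url = url.replace(sid, "")
    return url


def fingerprint(url):
    """Assumed fingerprint: SHA-1 hex digest of the cleaned URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


# URLs extracted from documents of a single (hypothetical) web host.
urls = [
    "http://example.com/a.html?sid=3f9a7c21be8d4560a1c2e3f4",
    "http://example.com/b.html?sid=3f9a7c21be8d4560a1c2e3f4",
]
session_ids = find_session_ids(urls)
clean_fingerprints = {fingerprint(clean_url(u, session_ids)) for u in urls}

# A URL is treated as already crawled when its cleaned fingerprint is known.
candidate = "http://example.com/a.html?sid=3f9a7c21be8d4560a1c2e3f4"
already_crawled = fingerprint(clean_url(candidate, session_ids)) in clean_fingerprints
```

Comparing fingerprints of the cleaned URLs, rather than the raw strings, is what lets two URLs that differ only in an embedded session identifier be recognized as the same page.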